Skip to content

[pull] main from triggerdotdev:main#209

Merged
pull[bot] merged 3 commits into
Dustin4444:mainfrom
triggerdotdev:main
Jun 11, 2026
Merged

[pull] main from triggerdotdev:main#209
pull[bot] merged 3 commits into
Dustin4444:mainfrom
triggerdotdev:main

Conversation

@pull

@pull pull Bot commented Jun 11, 2026

Copy link
Copy Markdown

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )


This change is Reviewable

## Summary

Adds a second backend for the realtime runs feed (`useRealtimeRun`,
`subscribeToRunsWithTag`, `subscribeToBatch`), built to stay healthy
when a single busy environment has many subscribers watching many runs
at once. It is gated behind a feature flag with the existing backend as
the default, so nothing changes for users until it is enabled per
environment.

## Design

A run change is published once, as a small self-describing record, to a
single per-environment channel. Every feed is then a predicate over that
one stream rather than owning a channel:

- A per-instance router indexes the currently-held feeds by run, tag,
and batch. When a run changes it hydrates the affected rows once and
serializes them once, then fans the result to every matching feed. One
hot shared tag watched by many subscribers costs a single database query
and serialize, not one per subscriber.
- Feeds that don't match a change are never woken, wake delivery per
environment is coalesced on a leading edge (250ms default) so a burst of
changes costs one wake, and cold reads coalesce onto a single
short-TTL-cached resolve.
- An admission gate bounds how many cold ClickHouse resolves run
concurrently, so a mass reconnect across many distinct filters queues
instead of stampeding the database.
- Changes that land while a client is between long-polls are delivered
on its next poll instead of waiting for the periodic backstop: each
environment buffers its recent change records, subscriptions linger
briefly after the last feed closes, and a newly-armed poll replays
exactly the connection's gap.
- The per-connection replay cursors behind that are shared across
instances via Redis (a single timestamp each), so a poll landing on a
different instance behind the load balancer still reads the connection's
true gap instead of falling back to a cold resolve. Cursor reads have a
bounded deadline and degrade to the cold-read path on any Redis trouble.
- Tag subscriptions with multiple tags match runs carrying all of the
tags, mirroring the existing backend's filter semantics, and live
long-polls hold for about 20 seconds to match its cadence.
- The per-environment channel supports Redis Cluster sharded pub/sub, so
the wake path scales horizontally across shards by environment.
- The backend reports its health through OpenTelemetry metrics (delivery
lag, poll resolution paths, backstop outcomes, replay and cursor-store
activity), with a provisioned Grafana dashboard for local development.

Everything is behind the feature flag and tunable via env vars; the
existing backend remains the default.
## Summary

Reliability and authorization fixes for realtime chat sessions:

- Session-stream waitpoint delivery is scoped to the environment, so two
environments using the same session `externalId` can no longer complete
each other's waitpoints.
- The session snapshot-url routes now enforce per-session authorization,
and appending to a session's `out` channel requires secret-key auth, so
a session-scoped token can't read another session's snapshot or forge
assistant output.
- Appends that carry an `X-Part-Id` header are deduplicated on retry, so
a retried send can't duplicate a message.
- Session creation rejects expired sessions (instead of triggering a run
that can never receive input), `externalId` is immutable after creation,
and the sessions list endpoint returns friendly `run_*` ids to match the
single-session routes.

## Rollout

The waitpoint cache key gains an environment prefix. To keep waitpoints
registered by the previous deploy working across the boundary, the drain
reads both the new and the previous key for this release; the legacy
read can be removed a release later once no pre-deploy waitpoints
remain.
…3891)

## Summary

A batch of reliability fixes for `chat.agent`:

- A user message sent while the agent is streaming is no longer
delivered twice (which could run a duplicate turn).
- Input appends carry an idempotency key (`X-Part-Id`) so a retried send
can't duplicate a message.
- `onTurnComplete` now fires on errored turns with the thrown error
attached, and the failed turn's user message is persisted so it isn't
lost on the next run.
- Stopping a generation clears the streaming state, so a page reload
doesn't replay the stopped turn.
- Custom agents and manual `chat.writeTurnComplete` callers trim the
output stream, sending a custom action no longer leaves a second stream
reader running, a long-lived `watch` subscription no longer grows its
dedupe set without bound, promoting a queued message to steering no
longer risks a double-send, and runs keep the full set of dashboard
tags.

The `X-Part-Id` header is accepted by current servers (they just don't
dedupe on it yet), so this is safe to ship ahead of the matching server
change.
@pull pull Bot locked and limited conversation to collaborators Jun 11, 2026
@pull pull Bot added the ⤵️ pull label Jun 11, 2026
@pull pull Bot merged commit f5f29ce into Dustin4444:main Jun 11, 2026
1 check was pending
@pull pull Bot had a problem deploying to dependabot-summary June 11, 2026 12:10 Failure
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant